Population of various countries: an analysis

Data provided by The World Bank

The tools used in this report are mainly provided by the pandas and matplotlib libraries.

1. Data preparation

First, let's take a look at the data we will be analyzing:

Next, let's filter the 'country groups' (such as South Asia, Arab World, etc.) out of the dataframe:
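A minimal sketch of this filtering step, assuming the table has a "Country Name" column and that the aggregate rows are listed by hand. The toy frame, its placeholder values and the `GROUP_NAMES` list are illustrative only, not the actual data:

```python
import pandas as pd

# Toy frame standing in for the World Bank table; the values are placeholders.
df = pd.DataFrame({
    "Country Name": ["Poland", "South Asia", "India", "Arab World"],
    "2021": [3, 40, 30, 10],
})

# Hypothetical, hand-maintained list of aggregate rows to drop.
GROUP_NAMES = ["South Asia", "Arab World"]

# Keep only the rows whose name is NOT one of the listed groups.
countries = df[~df["Country Name"].isin(GROUP_NAMES)].reset_index(drop=True)
print(countries)
```

The real list of groups is much longer, so in practice it may be easier to keep it in a separate file or to derive it from the metadata that accompanies the dataset.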

2. Barplots

a) Top 5 most populated

Let's find the 5 most populated countries (the most recent data is from 2021):
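One possible way to do this is `DataFrame.nlargest`, sketched here on a toy frame with placeholder values (the column names "Country Name" and "2021" are assumptions about the layout of the data):

```python
import pandas as pd

# Toy data: seven fictional countries with placeholder populations for 2021.
df = pd.DataFrame({
    "Country Name": list("ABCDEFG"),
    "2021": [5, 70, 20, 60, 10, 50, 40],
})

# The five rows with the largest value in the "2021" column, largest first.
top5 = df.nlargest(5, "2021")
print(top5)
```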

Now let us visualize how the populations of these countries changed over the years. For that, we prepare a smaller dataframe containing only these five countries (we will also remove the last column, "Unnamed", which is full of NaN values).
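A sketch of how the smaller dataframe could be cleaned and turned into a grouped barplot. The frame below is a two-country toy example with placeholder values, and the exact label of the empty column ("Unnamed: 4" here) is an assumption:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy wide-format frame: one row per country, one column per year.
wide = pd.DataFrame({
    "Country Name": ["A", "B"],
    "1960": [1.0, 2.0],
    "1990": [2.0, 3.0],
    "2021": [3.0, 3.5],
    "Unnamed: 4": [float("nan"), float("nan")],
})

# Drop the empty trailing column, whatever its exact label is.
wide = wide.loc[:, ~wide.columns.str.startswith("Unnamed")]

# Transpose so that years form the x axis and each country is a bar series.
by_year = wide.set_index("Country Name").T

fig, ax = plt.subplots()
by_year.plot.bar(ax=ax)
ax.set_xlabel("Year")
ax.set_ylabel("Population")
```

In a notebook the figure is shown inline; in a script, `plt.show()` would be needed. Passing grayscale colors such as `color=["0.2", "0.7"]` to `plot.bar` is one way to get a variant without color.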

(And the black&white version):

b) Random country from random year as a 'centroid'

For the next part we will pick one country and one year at random. After narrowing the dataframe down to the chosen year, we add a new column holding the absolute value of the difference between each country's population and the population of the chosen country.

So now, our chosen country and year are:

We now add the new column to our dataframe:
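The random choice and the new column could look roughly like this. It is a sketch on toy data with placeholder values; the column name `abs_diff` and the fixed seed are assumptions made only for illustration:

```python
import random
import pandas as pd

# Toy frame: four fictional countries, two years, placeholder populations.
df = pd.DataFrame({
    "Country Name": ["A", "B", "C", "D"],
    "1960": [10, 40, 25, 70],
    "1961": [12, 41, 30, 72],
})

rng = random.Random(0)  # fixed seed only so that the sketch is repeatable
year = rng.choice(["1960", "1961"])
country = rng.choice(df["Country Name"].tolist())

# Narrow the frame down to the chosen year.
one_year = df[["Country Name", year]].copy()

# Population of the chosen country in the chosen year.
reference = one_year.loc[one_year["Country Name"] == country, year].iloc[0]

# Absolute difference between every country and the chosen one.
one_year["abs_diff"] = (one_year[year] - reference).abs()
print(country, year)
print(one_year)
```

By construction the chosen country gets a difference of 0, so it will always be first when sorting by the new column.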

Now we'll make a dataframe containing our chosen country and the 4 others whose populations in the given year were closest to it, that is, those with the smallest absolute difference. Then we make yet another plot of population change over the years.
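Selecting the closest countries can be sketched with `DataFrame.nsmallest`. The frame and the `abs_diff` column name below are illustrative assumptions; country "B" plays the role of the chosen country (difference 0):

```python
import pandas as pd

# Toy frame with a precomputed absolute difference to the chosen country "B".
one_year = pd.DataFrame({
    "Country Name": list("ABCDEFG"),
    "abs_diff": [30, 0, 5, 100, 12, 7, 55],
})

# Five smallest differences: the chosen country itself plus its 4 nearest.
closest = one_year.nsmallest(5, "abs_diff")
names = closest["Country Name"].tolist()
print(names)
```

The resulting list of names can then be used with `isin` to pull the full time series of these five countries out of the main dataframe.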

(And the black&white version):

c) Poland from random year as a 'centroid'

Okay, so now we generate a similar barplot, but this time with Poland as the 'centroid'.

(And the black&white version):

It's actually pretty interesting that Poland was so far ahead of the other four in terms of population, only to finish last in 2021. The growth is truly remarkable, especially in the African countries, Sudan and Algeria.

3. Line plots

a) Top 5 populated countries

Let's remind ourselves of the dataframe we'll be using for the next plot:
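A line version of such a plot could be sketched as follows, assuming a dataframe indexed by country with one column per year (toy placeholder values, not the real top 5):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy frame: rows are countries, columns are years (as strings).
top5 = pd.DataFrame(
    {"1960": [1.0, 2.0], "1990": [2.0, 3.0], "2021": [3.0, 3.5]},
    index=pd.Index(["A", "B"], name="Country Name"),
)

# Transpose so that each country becomes one line over the years.
series = top5.T
series.index = series.index.astype(int)  # numeric years give a proper time axis

fig, ax = plt.subplots()
series.plot.line(ax=ax, marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Population")
```

Converting the year labels to integers matters here: with string labels the points would be spaced evenly, regardless of the gaps between years.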

b) Population of countries close to the "centroid"

Let's recall the dataframe for our line plot (we do not randomly select the country and year again, since we simply want a line version of the previous barplot):

c) Population of countries close to Poland

Let's remind ourselves of the dataframe we're going to use. We use Poland as the 'centroid' and, once again, we keep 2011 as our centroid year (we do NOT generate it randomly):

4. Bubble plots

Now our goal is to create corresponding bubble plots for the previously used data, that is, scatter plots with a third dimension indicated by the size of each point (a bubble). The bubble size will tell us the population density in the given year.

For the population density we need the actual area of each country, so that we can divide the population by the area of the corresponding country. For that, we import another dataset from The World Bank.
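The density computation itself can be sketched like this, assuming both tables share a "Country Name" column and have one column per year. The frames and values below are placeholders, and treating an area of 0 as missing is a design choice of this sketch (the text below fills missing areas with 0, and dividing by 0 would otherwise give infinite densities):

```python
import pandas as pd

# Toy population and area tables for a single year; values are placeholders.
population = pd.DataFrame({"Country Name": ["A", "B"], "2020": [1000.0, 500.0]})
area = pd.DataFrame({"Country Name": ["A", "B"], "2020": [10.0, 0.0]})

# Align the two tables on the country name.
merged = population.merge(area, on="Country Name", suffixes=("_pop", "_area"))

# Mask non-positive areas with NaN so the division yields NaN, not inf.
safe_area = merged["2020_area"].where(merged["2020_area"] > 0)
merged["density"] = merged["2020_pop"] / safe_area
print(merged)
```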

We have some missing values. Let's take care of that:

We removed the columns that contained only missing values. The remaining missing values are now replaced with 0, which should make sense: we treat such a country as non-existent.
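In pandas, these two steps can be written as a `dropna` followed by a `fillna`. A minimal sketch on a toy area table (column names and values are placeholders):

```python
import numpy as np
import pandas as pd

# Toy area table: "1960" is entirely missing, "1961" is partly missing.
area = pd.DataFrame({
    "Country Name": ["A", "B"],
    "1960": [np.nan, np.nan],
    "1961": [10.0, np.nan],
})

# 1) drop columns in which every value is missing,
# 2) replace the remaining missing values with 0.
cleaned = area.dropna(axis=1, how="all").fillna(0)
print(cleaned)
```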

a) Top 5 populated countries

For every country, let's prepare a separate dataframe with population, area and density for every year:

We will generate a bubble for every 10th year starting from 1961 (up to 2020, where our data ends), since a bubble plot containing as many as 60 bubbles (one for EVERY year) for each of the five countries (300 bubbles in total) would be cluttered and illegible.
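A sketch of the decade sampling and of a single-country bubble plot. The per-country frame, its column names, the synthetic values and the scaling factor for the marker size are all assumptions made for illustration:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy per-country frame: one row per year, synthetic placeholder values.
years = list(range(1961, 2021))
country = pd.DataFrame({
    "year": years,
    "population": [1000 + 10 * i for i in range(len(years))],
})
country["area"] = 50.0
country["density"] = country["population"] / country["area"]

# Keep every 10th year, counting from 1961.
decade = country[(country["year"] - 1961) % 10 == 0]

fig, ax = plt.subplots()
# Marker size (s) encodes density; the factor 5 is an arbitrary visual scale.
ax.scatter(decade["year"], decade["population"], s=decade["density"] * 5, alpha=0.5)
ax.set_xlabel("Year")
ax.set_ylabel("Population")
```

Note that `s` in `scatter` is the marker area in points squared, so density is mapped to the area of the bubble rather than to its radius.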

b) Chosen country as a centroid

c) Poland as a 'centroid'

5. Pie charts

The assignment is to prepare some other type of visualization for our three dataframes. Let that be pie charts (although it's doubtful that they are a good choice for our data).

What we'll actually be visualizing is the proportion between the populations of the given countries and how it changes over the years.
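One way to show such changing proportions is a row of pie charts, one per selected year. The sketch below uses a toy frame with placeholder values; the choice of three years is arbitrary:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy frame: rows are countries, columns are years, values are placeholders.
wide = pd.DataFrame(
    {"1960": [50.0, 30.0, 20.0], "1990": [45.0, 30.0, 25.0], "2021": [40.0, 30.0, 30.0]},
    index=["A", "B", "C"],
)

# Share of each country in the combined population, computed per year.
shares = wide / wide.sum()

fig, axes = plt.subplots(1, len(wide.columns), figsize=(9, 3))
for ax, year in zip(axes, wide.columns):
    ax.pie(shares[year], labels=shares.index, autopct="%1.0f%%")
    ax.set_title(year)
```

Each pie sums to 100% within its own year, so the charts show relative shares only; absolute growth is not visible in this form.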

a) Top 5 countries

b) Chosen country as a 'centroid'

c) Poland as a centroid

6. Plot showing some 'strange' behavior

There was a horrible period in the modern history of Cambodia. After years of the Vietnam War and the proclamation of an American-friendly government, in 1975 Cambodia underwent a coup: power was taken by the communist extremist Pol Pot and his Khmer Rouge army.

The new regime modelled itself on Maoist China during the Great Leap Forward, immediately evacuated the cities, and sent the entire population on forced marches to rural work projects. They attempted to rebuild the country's agriculture on the model of the 11th century, discarded Western medicine, and destroyed temples, libraries, and anything considered Western.

Estimates as to how many people were killed by the Khmer Rouge regime range from approximately one to three million. The most commonly cited figure is two million (about a quarter of the population). This era gave rise to the term Killing Fields, and the prison Tuol Sleng became notorious for its history of mass killing. Hundreds of thousands fled across the border into neighbouring Thailand. The regime disproportionately targeted ethnic minority groups. The Cham Muslims suffered serious purges, with as many as half of their population exterminated. Pol Pot was determined to keep his power and disenfranchise any enemies or potential threats, and thus increased his violent and aggressive actions against his people.

The regime's rule ended in January 1979, following an invasion by Vietnamese forces launched in late 1978. Pol Pot fled deep into the jungle, where he allegedly died in 1998, surrounded by his followers.

First, let's take a quick look at Cambodia's population dynamics. There we can easily see the horrific effects of the Khmer Rouge regime.

We will now compare the population decline with that of some other countries. Let's focus only on the period up to the year 2000; what happened afterwards is not that important for this analysis. Note that something strange also happened in Somalia in the early 90s. That is the outbreak of a devastating civil war, which is still going on as of 2023.
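Besides reading declines off the plot, they can also be located numerically with year-over-year growth rates. A small sketch using `Series.pct_change`; the series below is made of placeholder values and placeholder years and is not the actual data for any country:

```python
import pandas as pd

# Placeholder population series; neither the years nor the values are real.
pop = pd.Series(
    [100.0, 104.0, 108.0, 101.0, 95.0, 99.0],
    index=[1950, 1951, 1952, 1953, 1954, 1955],
)

# Relative change with respect to the previous year (first entry is NaN).
growth = pop.pct_change()

# Years in which the population was lower than the year before.
decline_years = growth[growth < 0].index.tolist()
print(decline_years)
```

Applied to each country's row, the same check would flag the years in which a population shrank, which makes it easier to pick countries worth comparing.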

7. Gantt plot

A Gantt plot is useful for presenting scheduled 'activities' taking place over time. In this case, we will make a schedule for the 2022/23 academic year at the University of Warsaw.
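matplotlib has no dedicated Gantt function, so one common approach is to draw horizontal bars with `barh`, using the start date as the left edge and the duration in days as the width. A sketch under that assumption; the activities and dates below are hypothetical placeholders, not the actual university calendar:

```python
from datetime import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hypothetical activities and dates, NOT the official academic calendar.
tasks = [
    ("Winter semester classes", datetime(2022, 10, 1), datetime(2023, 1, 31)),
    ("Winter exam session", datetime(2023, 2, 1), datetime(2023, 2, 14)),
    ("Summer semester classes", datetime(2023, 2, 20), datetime(2023, 6, 15)),
]

fig, ax = plt.subplots(figsize=(8, 2.5))
for row, (label, start, end) in enumerate(tasks):
    # One bar per activity: left edge = start date, width = duration in days.
    ax.barh(row, (end - start).days, left=mdates.date2num(start), height=0.5)

ax.set_yticks(range(len(tasks)))
ax.set_yticklabels([label for label, _, _ in tasks])
ax.xaxis_date()    # interpret x values as dates
ax.invert_yaxis()  # first activity at the top
fig.autofmt_xdate()
```

Passing `color="0.5"` and `edgecolor="black"` to `barh` would be one way to produce a variant without color.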

And as for the black&white version: